Illness Burden in Daycare Attendees

Introduction

Human bocavirus (HBoV-1), a DNA virus first identified in 2005 (Allander et al. 2005), has been frequently detected in young children experiencing acute respiratory tract illness. HBoV-1 has been detected in up to 18% of samples from children with respiratory illness (T. Jartti et al. 2012). Over 85% of children in the United States have antibodies to this virus by 4 years of age (Kahn et al. 2008), yet HBoV-1 is rarely detected in adults (T. Jartti et al. 2012). Furthermore, HBoV-1 is often detected in children without evidence of disease and is often identified alongside with other respiratory viruses associated with acute illness (T. Jartti et al. 2012, Martin et al. (2008), Fry et al. (2007), Allander et al. (2007), Christensen et al. (2010)). Thus, retrospective or cross-sectional analyses designed to make an inference on disease progression, incidence and persistence are problematic. The purpose of this analysis is in identifying trends regarding persistence and latency of human bocavirus 1 in a pediatric cohort.

Data Source

Data for the current study comes from previously reported prospective studies of human herpesvirus 6 (HHV-6) natural history and human bocavirus-1 shedding events from a group of infants (Zerr et al. 2005), (Martin et al. 2015). Incidence of HHV-6 has been reported between 64-83% in the United States and in some studies up to 90% prevalence has been reported (Braun, Dominguez, and Pellett 1997). For this reason, the entire cohort of children carried HHV-6 at one point in the 2-year study period.

Study Population

Inclusion criteria for enrollment included pregnancy of the mother, receipt of care at a Seattle area-based obstetrical practice, and provision of informed consent prior to participation. Children were then followed from birth for up to two years between April 1997 and August 2003. No further exclusion criteria were applied within the birth cohort.

Clinical Data Collection

Demographic data were collected from the mother at enrollment through administered surveys. Symptom data were collected using a daily diary recorded by the mother and/or father. Symptoms included: fever (temperature >38.0° C), roseolla cough, runny-nose, vomiting, diarrhea, rash, fussiness above the baseline level for the child, seizure, and physician visit for illness. Data was further collected on breast-feeding, breast-feeding duration, group child-care, daycare attendance and duration and playgroup attendance.

Methods

Infection episodes were defined as at least two weeks of RT-PCR positive HBoV-1 infection for entry and exit with no more than 1 interrim negative specimen within the episode. The first episode was termed primary with all others defined as secondary, tertiary, quaternary, and quinary. For purposes of analyses all secondary and subsequent infection episodes were categorized as ‘secondary’.

Libraries

library(tidyverse)
library(magrittr)
library(tableone)
library(ggthemes)

Load Data

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata")
# setwd("S:/MartinEpi/Analysis/Bocavirus Persistence Modeling/Rdata/")

files <- list.files(pattern = "*.rds")
file_names <- gsub(".rds", "", files)
file_names <- gsub("_", ".", file_names)
for (i in 1:length(files)) {
  assign(file_names[i], readRDS(files[i]))
}
rm(files, file_names)

Data processing

ids <- data.frame(id = unique(main$id))
to_factor <- c("pos", "female", "race", "race_text", "family.income",
               "siblings", "breast_fed", "day_care", "seizure_hx", 
               "pg_tmp")
floor_week <- function(x) lubridate::floor_date(x, unit = "week")
to_week <- c("maternal.dob", "infant.dob", "stopdate",
             "daycare.start", "daycare.end",
             "play.start", "play.end", "last.fu.visit")

demo <- main %>%
  mutate(
    id = as.character(id),
    race_text = as.character(race_text),
    race_text = ifelse(race_text %in% c("white,asian", "nat amer", "hispanic"),
                       "other", race_text),
    race_text = ifelse(race_text == "", "unknown", race_text),
    race = ifelse(race %in% c(3, 4, 5), 3, race),
    last.fu.visit = as.Date(last.f.u.visit, "%d-%b-%y"),
    pg_tmp = ifelse(play.group == "Y", 1, 0),
    pg_tmp = ifelse(play.group == "unknown", NA, pg_tmp),
    mage.at.birth = as.numeric(difftime(infant.dob, maternal.dob, 
                                        units = "days") / 365),
    dc_dur = as.numeric(difftime(daycare.end, daycare.start,
                                 units = "weeks")),
    pg_dur = as.numeric(difftime(play.end, play.start,
                                 units = "weeks"))
    ) %>%
  mutate_at(vars(to_factor), funs(factor)) %>%
  mutate_at(vars(to_week), funs(floor_week)) %>%
  select(id, pos, female, mage.at.birth, maternal.dob:until.when,
         daycare.start, daycare.end, dc_dur,
         play.start, play.end, pg_tmp, pg_dur,
         bf_duration, siblings:seizure_hx, last.fu.visit) %>%
  rename(
    hbov.pos = pos,
    play.group = pg_tmp
    )

Setup time-dependent covariates: daycare and playgroup attendance. Age should probably be adjusted for too - maybe piecewise on age?

dc_dates <- demo %>%
  mutate(daycare.end = if_else(is.na(daycare.end), stopdate, daycare.end)) %>%
  select(id, daycare.start, daycare.end) %>%
  filter(!is.na(daycare.start))

pg_dates <- demo %>%
  select(id, play.start, play.end) %>%
  filter(!is.na(play.start))
dummy <- function(x) {
  x <- ifelse(x != 0, 1, 0)
  x
}

diary <- diaries %>%
  right_join(ids, by = "id") %>%
  filter(!is.na(diary)) %>%
  mutate(week = floor_week(date)) %>%
  select(id, week, fever, cough, runnynose) %>%
  group_by(id, week) %>%
  summarise_at(vars(fever, cough, runnynose), funs(sum(., na.rm = TRUE))) %>%
  ungroup() %>%
  mutate_at(vars(fever, cough, runnynose), funs(dummy)) %>%
  mutate(
    sum = rowSums(.[3:5]),
    ill = ifelse(sum >= 2, 1, 0)
  ) %>%
  group_by(id) %>%
  mutate(stdy_wk = row_number()) %>%
  ungroup() %>%
  # add daycare dates
  full_join(dc_dates, by = "id") %>%
  # add playgroup dates
  full_join(pg_dates, by = "id") %>%
  mutate(
    daycare = ifelse(week >= daycare.start & week <= daycare.end, 1, 0),
    daycare = ifelse(is.na(daycare), 0, daycare),
    playgrp = ifelse(week >= play.start & week <= play.end, 1, 0),
    playgrp = ifelse(is.na(playgrp), 0, playgrp)
  ) %>%
  select(-daycare.start, -daycare.end, -play.start, -play.end)

Ol’ Classic Table 1

Assess normality of continuous data:

nonnormal <- c("total.number.of.family", 
               "number.of.family.18.yrs",
               "dc_dur")
par(mfrow = c(1, length(nonnormal)))
for(i in 1:length(nonnormal)) {
  plot(density(demo[, nonnormal[i]], na.rm = TRUE),
       main = nonnormal[i])
}

# dput(colnames(demo))
vars <- c("hbov.pos", "female", "mage.at.birth", "race_text", 
          "total.number.of.family", "number.of.family.18.yrs", 
          "family.income", "siblings", "breast_fed", "bf_duration", 
          "day_care", "dc_dur", "play.group", "pg_dur", "seizure_hx")
tbl1 <- CreateTableOne(vars = vars, data = demo)
print(tbl1, nonnormal = nonnormal)
                                        
                                         Overall             
  n                                         86               
  hbov.pos = 1 (%)                          66 (76.7)        
  female = 1 (%)                            32 (37.2)        
  mage.at.birth (mean (sd))              33.99 (4.30)        
  race_text (%)                                              
     asian                                   9 (10.5)        
     black                                   3 ( 3.5)        
     other                                   7 ( 8.1)        
     unknown                                 5 ( 5.8)        
     white                                  62 (72.1)        
  total.number.of.family (median [IQR])   2.50 [2.00, 3.00]  
  number.of.family.18.yrs (median [IQR])  0.00 [0.00, 1.00]  
  family.income = 2 (%)                     69 (97.2)        
  siblings = 1 (%)                          41 (47.7)        
  breast_fed = 1 (%)                        84 (97.7)        
  bf_duration (mean (sd))                47.43 (25.04)       
  day_care = 1 (%)                          34 (39.5)        
  dc_dur (median [IQR])                  65.14 [41.00, 80.57]
  play.group = 1 (%)                        55 (64.7)        
  pg_dur (mean (sd))                     62.71 (30.01)       
  seizure_hx = 1 (%)                        14 (16.7)        
rm(tbl1)

Summarize panel data

sapply(diary[, c(3:7, 9, 10)], table)
$fever

   0    1 
7661  583 

$cough

   0    1 
6706 1538 

$runnynose

   0    1 
5537 2707 

$sum

   0    1    2    3 
5068 1766 1168  242 

$ill

   0    1 
6834 1410 

$daycare

   0    1 
6363 1881 

$playgrp

   0    1 
4950 3294 

Visualize illness chains

some_ids <- sample(unique(diary$id), 4, replace = FALSE)
diary %>%
  filter(id %in% some_ids) %>%
  select(id, stdy_wk, ill) %>%
  ggplot(aes(x = stdy_wk, y = ill)) %>%
  add(geom_point()) %>%
  add(geom_line()) %>%
  add(facet_wrap(~id, nrow = 2)) %>%
  add(scale_y_continuous(breaks = c(0, 1))) %>%
  add(theme_trueMinimal(16)) %>%
  add(labs(
    x = "\nStudy Week",
    y = "Recorded Illness\n"
  ))

some_ids <- sample(unique(diary$id), 4, replace = FALSE)
diary %>%
  filter(id %in% some_ids) %>%
  select(id, stdy_wk, ill) %>%
  ggplot(aes(x = stdy_wk, y = ill, color = id)) %>%
  add(geom_point()) %>%
  add(geom_line()) %>%
  add(scale_y_continuous(breaks = c(0, 1))) %>%
  add(theme_trueMinimal(16)) %>%
  add(labs(
    x = "\nStudy Week",
    y = "Recorded Illness\n"
  ))

Illness episodes/recorded are randomly distributed throughout the studied life history of these infants, as evidenced by the above vignettes.

Daycare attendance may modify illness frequency, the timeliness of daycare attendance of the infant lifecourse is more or less random:

some_ids <- sample(demo$id[demo$day_care == 1], 4)

dc_times <- diary %>%
  filter(id %in% some_ids & daycare == 1) %>%
  group_by(id) %>%
  mutate(first = row_number() == 1, last = row_number() == n()) %>%
  filter(first | last) %>%
  select(id, stdy_wk) %>%
  mutate(
    timevar = row_number(),
    timevar = ifelse(timevar == 1, "xmin", "xmax")
    ) %>%
  spread(key = timevar, value = stdy_wk)

diary %>%
  filter(id %in% some_ids) %>%
  ggplot(aes(x = stdy_wk, y = ill)) %>%
  add(geom_point()) %>%
  add(geom_line()) %>%
  add(geom_rect(data = dc_times, 
                aes(x = NULL, y = NULL,
                    xmin = xmin, 
                    xmax = xmax,
                    ymin = -Inf, 
                    ymax = Inf),
                alpha = 0.3)) %>%
  add(facet_wrap(~id)) %>%
  add(theme_trueMinimal()) %>%
  add(scale_y_continuous(breaks = c(0, 1))) %>%
  add(labs(
    x = "\nStudy Week",
    y = "ARI episodes\n",
    caption = "Period in grey denotes daycare attendance"
  ))

Modelling

Markov Approach

Identify illness episodes: negative (1), detection (2), and terminal (3) (states), first observation (firstobs).

states <- diary %>%
  group_by(id) %>%
  mutate(
    firstobs = as.integer(row_number() == 1),
    state = ill + 1,
    state = as.integer(ifelse(row_number() == n(), 3, state)),
    daycare = factor(daycare, 0:1, 0:1),
    playgrp = factor(playgrp, 0:1, 0:1)
    ) %>%
  rename(weeks = stdy_wk) %>%
  ungroup() %>%
  mutate(id = as.integer(id)) %>%
  arrange(id, weeks)

Tally observed movement between states and sojourn times

library(msm)
statetable.msm(state, id, data = states)
    to
from    1    2    3
   1 6013  676   68
   2  658  725   18

We have a three-state markov chain: illness - recovery - terminus, terminus being the end of the study. The Q, transition matrix is laid out like so:

\[ Q = \left(\begin{array}{cc} -(q_{12} + q_{13}) & q_{12} & q_{13} \\ q_{21} & -(q_{21} + q_{23}) & q_{23} \\ 0 & 0 & 0 \end{array}\right) \]

Which can be represented diagrammatically as,

library(diagram)

names <- c("ARI (-)", "ARI (+)", "END")
M <- matrix(nrow = 3, ncol = 3, byrow = TRUE, data = 0)
M[1, 1] <- paste(expression("q"[11]))
M[2, 1] <- paste(expression("q"[12]))
M[2, 2] <- paste(expression("q"[22]))
M[1, 2] <- paste(expression("q"[21]))
M[3, 1] <- paste(expression("q"[13]))
M[3, 2] <- paste(expression("q"[23]))
curves <- matrix(c(0, 0.1, 0, 0.1, 0, 0, 0, 0, 0), byrow = TRUE, nrow = 3)
plotmat(M,
        pos = c(2, 1),
        shadow.size = 0,
        curve = curves,
        name = names,
        lwd = 1,
        box.lwd = 2,
        cex.txt = 0.8,
        self.cex = 0.5,
        self.shiftx = c(-0.1, 0.1),
        self.shifty = c(0.1, 0.1),
        self.arrpos = c(1.6, 1.5),
        arr.type = "triangle")

Initial values could be set by supposing that transitions between states take place only at the observation times, fortunately the data we are working with can be said to have exact observation times. If we observe \(n_{rs}\) transitions from state \(r\) to state \(s\), and a total of \(n_{r}\) transitions from state \(r\), then \(q_{rs}/q_{rr}\) can be estimated by \(n_{rs}/n_{r}\). Then, given a total of \(T_{r}\) years spent in state \(r\), the mean sojourn time \(1/q_{rr}\) can be estimated as \(T_{r}/n_{r}\). Thus, \(n_{rs}/T_{r}\) is a crude estimate of \(q_{rs}\). The msm package provides a function crudeinits.msm for calculating initial values in this way.

Initialized Q matrix, entries generated above

Q <- matrix(c(0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0), 
            byrow = TRUE, nrow = 3, ncol = 3)

crudeQ <- crudeinits.msm(state ~ weeks, 
                         subject = id,
                         data = states,
                         qmatrix = Q)

crudeQ
           [,1]       [,2]       [,3]
[1,] -0.1101080  0.1000444 0.01006364
[2,]  0.4696645 -0.4825125 0.01284797
[3,]  0.0000000  0.0000000 0.00000000
# rowSums(crudeQ)
# 0 0 0 (row sums constrained to be zero)

Without Covariates

Running the model without covariates or censoring

markov1 <- msm(state ~ weeks,
               subject = id,
               deathexact = 3,
               data = states,
               qmatrix = crudeQ,
               gen.inits = TRUE)
markov1

Call:
msm(formula = state ~ weeks, subject = id, data = states, qmatrix = crudeQ,     gen.inits = TRUE, deathexact = 3)

Maximum likelihood estimates

Transition intensities
                  Baseline                      
State 1 - State 1 -0.160665 (-0.173949,-0.14840)
State 1 - State 2  0.151337 ( 0.139166, 0.16457)
State 1 - State 3  0.009328 ( 0.006555, 0.01328)
State 2 - State 1  0.706074 ( 0.649974, 0.76702)
State 2 - State 2 -0.722408 (-0.783604,-0.66599)
State 2 - State 3  0.016334 ( 0.007360, 0.03625)

-2 * log-likelihood:  7248.184 
[Note, to obtain old print format, use "printold.msm"]
qmatrix.msm(markov1)
        State 1                        State 2                       
State 1 -0.160665 (-0.173949,-0.14840)  0.151337 ( 0.139166, 0.16457)
State 2  0.706074 ( 0.649974, 0.76702) -0.722408 (-0.783604,-0.66599)
State 3 0                              0                             
        State 3                       
State 1  0.009328 ( 0.006555, 0.01328)
State 2  0.016334 ( 0.007360, 0.03625)
State 3 0                             
pmatrix.msm(markov1, ci = "normal", t = 1, cl = 0.95)
        State 1                     State 2                    
State 1 0.889956 (0.882873,0.89716) 0.100358 (0.093287,0.10748)
State 2 0.468226 (0.442186,0.49544) 0.517441 (0.490169,0.54200)
State 3 0                           0                          
        State 3                    
State 1 0.009687 (0.007553,0.01291)
State 2 0.014332 (0.008545,0.02768)
State 3 1.000000 (1.000000,1.00000)
sojourn.msm(markov1)
        estimates         SE        L        U
State 1  6.224122 0.25226171 5.748826 6.738714
State 2  1.384259 0.05742931 1.276154 1.501521

The interpretation of the estimated transition intensity matrix is somewhat interesting: the mean time in non-ill (non-ARI) state is 6.2 weeks (95% CI: 5.7 - 6.7 weeks) and the mean duration of ARI episodes is 1.4 weeks (95% CI: 1.3 - 1.5 weeks).

The total duration in any particular state can be estimated with bootstrapped errors:

# totals <- totlos.msm(markov1, 
#                      fromt = 0, 
#                      tot = 52, 
#                      ci = "bootstrap",
#                      B = 100, 
#                      cores = 4)

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output/")
# saveRDS(totals, "markov1_totals52.rds")
totals <- readRDS("markov1_totals52.rds")

totals
       State 1  State 2  State 3
res   33.24745 6.824620 11.92793
2.5%  31.52225 6.332389 10.00724
97.5% 34.79253 7.464456 13.86354

Within the first year of life, the estimated total number of weeks spent in ARI episodes is 6.8 weeks (95% CI: 6.3 - 7.4).

Each entry in the estimated transition probability matrix can be interpreted as the probability of occupying state s at time = origin + t, conditionally on occupying state r at time = origin. For t = 1 (1 weeks time), the probability of an ARI is 10.0% (95% CI: 9.3% - 10.8%). At 3 weeks after birth the probability of an ARI is 15.7% (95% CI: 14.6% - 16.8%). We can plot the estimated probability trajectory for 10 simulated weeks:

pmat_sim <- function(model,
                     iterations,
                     levels = NULL, 
                     labels = NULL) {
  require(dplyr)
  df_list <- list()
  for (i in 1:iterations) {
    pmat <- pmatrix.msm(markov1, ci = "normal", t = i, cl = 0.95)
    estimates <- as.data.frame(as.table(pmat$estimates))
    colnames(estimates) <- c("From", "To", "P_rs")
    UB <- as.data.frame(as.table(pmat$U))
    LB <- as.data.frame(as.table(pmat$L))
    estimates$UB <- UB$Freq
    estimates$LB <- LB$Freq
    estimates$time <- as.numeric(i)
    df_list[[i]] <- estimates
  } # end for
  df <- bind_rows(df_list)
  if (!is.null(levels)) {
    df <- df %>% 
      mutate(
        transition = paste(From, To, sep = " - "),
        transition = factor(transition,
                            levels = levels,
                            labels = labels)
      ) %>%
      select(-From, -To)
  } # end if
  return(df)
}

Qlevels <- paste("State", 1:ncol(Q))
Qlevels <- expand.grid(Qlevels, Qlevels)
Qlevels <- mutate(Qlevels, transition = paste(Var1, "-", Var2))[, 3]

# choose what you want to call the transitions
Qlabels <- c("ARI (-)", "ARI (+)", "END")
Qlabels <- expand.grid(Qlabels, Qlabels)
Qlabels <- mutate(Qlabels, transition = paste(Var1, "-", Var2))[, 3]

pmat <- pmat_sim(markov1, 10, levels = Qlevels, labels = Qlabels)

# this includes everything, as a check that the model is specified correctly
pmat %>%
  arrange(transition, time) %>%
  ggplot(aes(x = time)) %>%
  add(geom_line(aes(y = P_rs))) %>%
  add(geom_line(aes(y = UB), linetype = 2)) %>%
  add(geom_line(aes(y = LB), linetype = 2)) %>%
  add(facet_wrap(~transition, nrow = nrow(Q), ncol = ncol(Q))) %>%
  add(theme_base()) %>%
  add(labs(
    x = "\n Time since birth, weeks",
    y = expression(hat(P)[rs]),
    caption = "--- Gaussian 95% confidence bands"
  ))

All infants exit the study (END is an absorbing state), i.e. in the above plot, ARI (-) - END and ARI (+) - END are the same.

Probabilities for the first year of life (52 weeks)

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output/")
# pmat <- pmat_sim(markov1, 52, levels = Qlevels, labels = Qlabels)
# saveRDS(pmat, "markov1_pmat52.rds")
pmat <- readRDS("markov1_pmat52.rds")

pmat_red <- pmat[pmat$transition %in% Qlabels[c(1:2, 4:5)], ]
pmat_red$transition <- factor(pmat_red$transition, 
                              levels(pmat_red$transition)[c(1, 4, 2, 5)])

pmat_red %>%
  arrange(transition, time) %>%
  ggplot(aes(x = time)) %>%
  add(geom_line(aes(y = P_rs))) %>%
  add(geom_line(aes(y = UB), linetype = 2)) %>%
  add(geom_line(aes(y = LB), linetype = 2)) %>%
  add(facet_wrap(~transition, 
                 nrow = length(levels(pmat_red$transition)) / 2, 
                 ncol = length(levels(pmat_red$transition)) / 2 )) %>%
  add(theme_base()) %>%
  add(labs(
    x = "\n Time since birth, weeks",
    y = expression(hat(P)[rs]),
    caption = "--- Gaussian 95% confidence bands"
  ))

prevalence.msm(markov1)
$Observed
     State 1 State 2 State 3 Total
1         86       0       0    86
11.6      82       4       0    86
22.2      70      16       0    86
32.8      76      10       0    86
43.4      71      15       0    86
54        74      12       0    86
64.6      68      18       0    86
75.2      62      16       8    86
85.8      50      19      17    86
96.4      44      10      32    86
107        0       0      86    86

$Expected
      State 1   State 2   State 3 Total
1    86.00000  0.000000  0.000000    86
11.6 63.50574 13.499315  8.994946    86
22.2 56.78133 12.071490 17.147182    86
32.8 50.77009 10.793525 24.436381    86
43.4 45.39525  9.650854 30.953898    86
54   40.58942  8.629153 36.781430    86
64.6 36.29236  7.715616 41.992022    86
75.2 32.45022  6.898791 46.650988    86
85.8 29.01483  6.168441 50.816725    86
96.4 25.94314  5.515411 54.541451    86
107  23.19663  4.931514 57.871854    86

$`Observed percentages`
       State 1   State 2    State 3
1    100.00000  0.000000   0.000000
11.6  95.34884  4.651163   0.000000
22.2  81.39535 18.604651   0.000000
32.8  88.37209 11.627907   0.000000
43.4  82.55814 17.441860   0.000000
54    86.04651 13.953488   0.000000
64.6  79.06977 20.930233   0.000000
75.2  72.09302 18.604651   9.302326
85.8  58.13953 22.093023  19.767442
96.4  51.16279 11.627907  37.209302
107    0.00000  0.000000 100.000000

$`Expected percentages`
       State 1   State 2  State 3
1    100.00000  0.000000  0.00000
11.6  73.84388 15.696878 10.45924
22.2  66.02480 14.036617 19.93858
32.8  59.03499 12.550611 28.41440
43.4  52.78517 11.221923 35.99291
54    47.19700 10.033899 42.76910
64.6  42.20042  8.971646 48.82793
75.2  37.73282  8.021850 54.24533
85.8  33.73818  7.172606 59.08922
96.4  30.16644  6.413268 63.42029
107   26.97283  5.734319 67.29285
plot.prevalence.msm(markov1, legend.pos = c(0, 100))

The model does not fit the data well at all.

plot(seq(1, 107, length.out = 11), 
     as.numeric(prevalence.msm(markov1)$Observed[, 2]), 
     type = "l",
     xlab = "Infant Age (weeks)",
     ylab = "Prevalence (%)")
lines(seq(1, 107, length.out = 11), 
      as.numeric(prevalence.msm(markov1)$Expected[, 2]),
      type = "l", 
      lty = 2)

With Covariates

Sex

states_cov <- demo %>%
  select(id, hbov.pos, female, siblings) %>%
  mutate(id = as.integer(id)) %>%
  right_join(states, by = "id") %>%
  arrange(id, weeks)

markov2 <- msm(state ~ weeks,
               subject = id,
               covariates = ~female,
               deathexact = 3,
               data = states_cov,
               qmatrix = crudeQ,
               gen.inits = TRUE)
markov2

Call:
msm(formula = state ~ weeks, subject = id, data = states_cov,     qmatrix = crudeQ, gen.inits = TRUE, covariates = ~female,     deathexact = 3)

Maximum likelihood estimates
Baselines are with covariates set to their means

Transition intensities with hazard ratios for each covariate
                  Baseline                       female1              
State 1 - State 1 -0.160589 (-0.173874,-0.14832)                      
State 1 - State 2  0.151280 ( 0.139110, 0.16451) 0.9230 (0.7748,1.100)
State 1 - State 3  0.009309 ( 0.006528, 0.01327) 1.1123 (0.5455,2.268)
State 2 - State 1  0.706093 ( 0.649973, 0.76706) 0.9943 (0.8365,1.182)
State 2 - State 2 -0.722332 (-0.783558,-0.66589)                      
State 2 - State 3  0.016239 ( 0.007245, 0.03640) 0.8192 (0.1420,4.728)

-2 * log-likelihood:  7247.255 
[Note, to obtain old print format, use "printold.msm"]

Estimated transition intensity matrix and transition probability matrix for the first 30 weeks of infant life. For this infant cohort, Children first began daycare at a median age of 30 weeks (range: 6 - 87 weeks).

hazard.msm(markov2)
$female1
                         HR         L        U
State 1 - State 2 0.9230121 0.7747805 1.099603
State 1 - State 3 1.1123331 0.5454870 2.268221
State 2 - State 1 0.9943437 0.8364677 1.182017
State 2 - State 3 0.8192264 0.1419582 4.727671

Infant sex has no statistically significant bearing on rate of ARI incidence in this cohort (HR: 0.92, 95% CI: 0.77 - 1.10).

siblings

markov3 <- msm(state ~ weeks,
               subject = id,
               covariates = ~siblings,
               deathexact = 3,
               data = states_cov,
               qmatrix = crudeQ,
               gen.inits = TRUE)
markov3

Call:
msm(formula = state ~ weeks, subject = id, data = states_cov,     qmatrix = crudeQ, gen.inits = TRUE, covariates = ~siblings,     deathexact = 3)

Maximum likelihood estimates
Baselines are with covariates set to their means

Transition intensities with hazard ratios for each covariate
                  Baseline                       siblings1            
State 1 - State 1 -0.160776 (-0.174084,-0.14849)                      
State 1 - State 2  0.151488 ( 0.139293, 0.16475) 1.0848 (0.9170,1.283)
State 1 - State 3  0.009288 ( 0.006493, 0.01328) 0.8984 (0.4364,1.850)
State 2 - State 1  0.706881 ( 0.650673, 0.76795) 1.1450 (0.9700,1.352)
State 2 - State 2 -0.723221 (-0.784548,-0.66669)                      
State 2 - State 3  0.016340 ( 0.007331, 0.03642) 1.3671 (0.2770,6.747)

-2 * log-likelihood:  7245.263 
[Note, to obtain old print format, use "printold.msm"]
hazard.msm(markov3)
$siblings1
                         HR         L        U
State 1 - State 2 1.0848434 0.9169877 1.283425
State 1 - State 3 0.8984084 0.4363995 1.849538
State 2 - State 1 1.1449919 0.9699702 1.351595
State 2 - State 3 1.3670768 0.2769839 6.747321

Having siblings has no bearing on ARI incidence (HR: 1.08, 95% CI: 0.92 - 1.28)

Siblings and Sex

markov4 <- msm(state ~ weeks,
               subject = id,
               covariates = ~female + siblings + female * siblings,
               deathexact = 3,
               data = states_cov,
               qmatrix = crudeQ,
               gen.inits = TRUE)
hazard.msm(markov4)
$female1
                         HR         L         U
State 1 - State 2 0.8870437 0.6893740  1.141393
State 1 - State 3 1.0144727 0.3850980  2.672449
State 2 - State 1 0.9935760 0.7740258  1.275401
State 2 - State 3 1.2804658 0.1168769 14.028367

$siblings1
                         HR         L         U
State 1 - State 2 1.0739055 0.8691435  1.326907
State 1 - State 3 0.8018425 0.2910981  2.208711
State 2 - State 1 1.1616555 0.9430632  1.430915
State 2 - State 3 1.8862274 0.2781742 12.790020

$`female1:siblings1`
                         HR         L         U
State 1 - State 2 1.0557935 0.7416577  1.502984
State 1 - State 3 1.2727446 0.2893215  5.598889
State 2 - State 1 0.9671584 0.6824291  1.370685
State 2 - State 3 0.3890003 0.0110541 13.689143

One could argue that, given differences in behavior between males and females, contact with siblings could affect the rate of respiratory illnesses. Infants of this age may be too young to show these behavioral differences, explaining the lack of an effect here and in the model above (markov3).

Daycare Attendance

markov4 <- msm(state ~ weeks,
               subject = id,
               covariates = ~daycare,
               deathexact = 3,
               data = states_cov,
               qmatrix = crudeQ,
               gen.inits = TRUE)

hazard.msm(markov4)
$daycare1
                          HR            L            U
State 1 - State 2 1.72038574 1.435746e+00 2.061456e+00
State 1 - State 3 1.64811399 8.286008e-01 3.278152e+00
State 2 - State 1 0.73186096 6.127297e-01 8.741546e-01
State 2 - State 3 0.00121789 7.696874e-20 1.927089e+13

Daycare attendance positively affects rate of ARI: daycare attendees are 1.7 times more lkely (70%) to experience an ARI episode.

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output/")

# totals1 <- totlos.msm(markov4, 
#                       covariates = list(1), 
#                       ci = "bootstrap", 
#                       B = 100, 
#                       cores = 4)

# saveRDS(totals1, "markov4_totals1.rds")
totals1 <- readRDS("markov4_totals1.rds")

# totals0 <- totlos.msm(markov4, 
#                       covariates = list(0), 
#                       ci = "bootstrap", 
#                       B = 100, 
#                       cores = 4)

# saveRDS(totals0, "markov4_totals0.rds")
totals0 <- readRDS("markov4_totals0.rds")
totals1
        State 1  State 2 State 3
res    78.35131 31.21414     Inf
2.5%   54.24131 20.22914     Inf
97.5% 148.41818 59.94543     Inf

For those in daycare, the expected total duration of ARI episodes is 31.2 weeks, compare this with non-daycare attendees…

totals0
        State 1  State 2 State 3
res    78.34706 12.78565     Inf
2.5%   60.36012 10.39239     Inf
97.5% 107.03928 18.09093     Inf

where the estimated total duration of ARI episodes is 12.8 weeks (95% CI: 10.7 - 16.9).

Daycare and HBoV-1 shedding

markov5 <- msm(state ~ weeks,
               subject = id,
               covariates = ~daycare + hbov.pos + daycare * hbov.pos,
               deathexact = 3,
               data = states_cov,
               qmatrix = crudeQ,
               gen.inits = TRUE)

hazard.msm(markov5)
$daycare1
                         HR            L          U
State 1 - State 2 2.0477290 1.369384e+00   3.062102
State 1 - State 3 4.5063205 3.041483e-01  66.766528
State 2 - State 1 0.8874682 5.966144e-01   1.320115
State 2 - State 3 0.1050038 4.622524e-05 238.523226

$hbov.pos1
                         HR          L         U
State 1 - State 2 1.1532994 0.89338278  1.488835
State 1 - State 3 2.7043893 0.25205678 29.016167
State 2 - State 1 0.9814590 0.76592027  1.257653
State 2 - State 3 0.4084863 0.09944391  1.677942

$`daycare1:hbov.pos1`
                           HR            L            U
State 1 - State 2 0.802059912 5.110628e-01 1.258750e+00
State 1 - State 3 0.281076870 1.713320e-02 4.611176e+00
State 2 - State 1 0.785426469 5.035832e-01 1.225011e+00
State 2 - State 3 0.001871154 9.217153e-41 3.798589e+34

The carriage time of an ARI associated organism, Human Bocavirus 1, may confound the effect of daycare attendance on ARI incidence. Here, there is no significant interaction between daycare attendance and HBoV-1 shedding on the incidence of ARI (HR: 0.8, 95% CI: 0.51 - 1.26).

Playgroup Association

markov6 <- msm(state ~ weeks,
               subject = id,
               covariates = ~playgrp,
               deathexact = 3,
               data = states_cov,
               qmatrix = crudeQ,
               gen.inits = TRUE)

hazard.msm(markov6)
$playgrp1
                          HR            L          U
State 1 - State 2 1.22094116 1.031114e+00   1.445716
State 1 - State 3 0.92198155 4.570873e-01   1.859710
State 2 - State 1 1.01819422 8.618171e-01   1.202946
State 2 - State 3 0.04206069 2.261740e-06 782.185952

Infants in play groups (defined how?) are more at risk of incurring ARI episodes than infants not attending a play group by 20% (HR: 1.22, 95% CI: 1.03 - 1.45).

Playgroup and Daycare Attendance

The correlation of daycare attendance and playgroup attendance are weakly correlated, so I can put them both in the model:

tab <- xtabs(~day_care + play.group, data = demo)
vcd::assocstats(tab)
                   X^2 df P(> X^2)
Likelihood Ratio 6.172  1 0.012979
Pearson          6.215  1 0.012667

Phi-Coefficient   : 0.27 
Contingency Coeff.: 0.261 
Cramer's V        : 0.27 

Cramer’s V - a strength of correlation estimation technique for categorical data - is 0.27, weakly correlated. If we look at the data time-series view of the Cramer’s V, the correlation is very weak:

tab <- xtabs(~daycare + playgrp, data = states_cov)
vcd::assocstats(tab)
                    X^2 df P(> X^2)
Likelihood Ratio 78.545  1        0
Pearson          76.822  1        0

Phi-Coefficient   : 0.097 
Contingency Coeff.: 0.096 
Cramer's V        : 0.097 

Subject-wise, it varys:

tmp <- states_cov %>%
  group_by(id) %>%
  mutate(
    pg = as.numeric(playgrp) - 1,
    dc = as.numeric(daycare) - 1,
    sumpg = sum(pg),
    sumdc = sum(dc)
  ) %>%
  ungroup() %>%
  filter(sumpg != 0 & sumdc != 0) %>%
  select(id, playgrp, daycare)
# length(unique(tmp$id))
# only 15

fcor <- function(df) {
  tab <- xtabs(~daycare + playgrp, data = df)
  return(data.frame(COR = vcd::assocstats(tab)$cramer))
}

plyr::ddply(tmp, "id", fcor) %>%
  ggplot() %>%
  add(geom_linerange(aes(x = id, ymin = 0, ymax = COR))) %>%
  add(geom_hline(yintercept = 0.5, color = 'red')) %>%
  add(theme_trueMinimal())

Only 15 individuals attended a daycare and engaged in playgroup activities at the same time and the correlations were only strong for 5 infants.

markov7 <- msm(state ~ weeks,
               subject = id,
               covariates = ~playgrp + daycare + playgrp * daycare,
               deathexact = 3,
               data = states_cov,
               qmatrix = crudeQ,
               gen.inits = TRUE)

hazard.msm(markov7)
$playgrp1
                         HR         L        U
State 1 - State 2 1.3968723 1.1353162 1.718686
State 1 - State 3 0.6292762 0.2210001 1.791802
State 2 - State 1 1.0001795 0.8166636 1.224934
State 2 - State 3 0.1363798 0.0122035 1.524108

$daycare1
                          HR           L         U
State 1 - State 2 2.00674349 1.589964245 2.5327736
State 1 - State 3 0.92350047 0.279493354 3.0514254
State 2 - State 1 0.76711770 0.609676684 0.9652158
State 2 - State 3 0.06826926 0.001024963 4.5471826

$`playgrp1:daycare1`
                            HR             L             U
State 1 - State 2 0.7072435375  4.830695e-01  1.035448e+00
State 1 - State 3 3.9035879236  7.842447e-01  1.943016e+01
State 2 - State 1 0.8488092967  5.835198e-01  1.234709e+00
State 2 - State 3 0.0007048726 1.009882e-140 4.919838e+133

Attending daycare while also engaging in play groups during that same week has no statistically signficant affect on ARI incidence (HR: 0.71, 95% CI: 0.49 - 1.04), After adjustment, daycare attendance and playgroup engagement affect the rate of ARI with similar strengths of association and variance as their respective models, alone.

Changepoint Analysis

Infants enter daycare at the - seemingly random - behest of their guardians. As methods have not been widely developed for longitudinal data with random changepoints, the idea here is to base the timing of life on the day children enter daycare, day 0, and count backward and forward from there. The interest lies on entry into daycare and the experience while in daycare, not the exit and so the data have been truncated to the end of daycare attendance.

chng.df <- diary %>%
  filter(daycare == 1) %>%
  group_by(id) %>%
  mutate(
    first = row_number() == 1, 
    last = row_number() == n()
    ) %>%
  filter(first | last) %>%
  select(id, stdy_wk) %>%
  mutate(
    timevar = row_number(),
    timevar = ifelse(timevar == 1, "start", "finish")
    ) %>%
  ungroup() %>%
  spread(key = timevar, value = stdy_wk) %>%
  full_join(states_cov %>% mutate(id = as.character(id)), by = "id") %>%
  filter(!is.na(start) & weeks <= finish) %>%
  mutate(time = weeks - start) %>%
  select(-finish, -week, -sum, -weeks, -firstobs, -state)

The timing of the daycare attendance varies wildly…

chng.df %>%
  distinct(id, start) %>%
  ggplot(aes(x = start)) %>%
  add(geom_histogram(color = "black")) %>%
  add(theme_trueMinimal(16)) %>%
  add(labs(
    x = "Daycare Start, Week of Infant Life",
    y = "Frequency"
  ))

tmp <- chng.df %>% distinct(id, start)
psych::describe(tmp$start)
   vars  n  mean    sd median trimmed   mad min max range skew kurtosis
X1    1 34 31.24 23.07     24   28.71 17.79   6  87    81 0.88    -0.45
     se
X1 3.96

…ranging from 6 to 87 weeks. The median being 31 weeks.

The approach here is to model ARI incidence using a mixed-model and making a changepoint at t = 0. Mixed-models extend to piece-wise models simply through allowing a random/varying intercept and slope at the changepoint. The model used here is a bernoulli model with a logit link for ARI incidence,

\[ y = \left\{ \begin{array}{ll} 1 & \quad ARI (+) \\ 0 & \quad ARI (-) \end{array} \right. \]

The model takes this form,

\[\eta_{ij} = (\beta_0 + u_{0j}) + \beta Z_{ij} + e_{ij}\quad i=1,...,n, j = 1,...,m\] Where each of \(n\) individuals, \(i\), experience \(m\) time points, \(j\). \(u_{0j}\) denote random intercepts for each individual, \(\left\{Z_{ij}\right\}_{i=1}^n\) represent any covariates of interest and \(e_{ij}\) are residual errors for each individual. \(\eta_{ij}\) denotes the linear predictor which is related to \(Y\) (defined above) through the \(logit\) link \(g(E(y)) = \eta\),

\[\log \left\{\frac{E[y]}{1-E[y]}\right\} = \eta\]

The broken-stick at \(t=0\) comes in through an additional covariate,

\[\eta_{ij} = (\beta_0 + u_{0j}) + \beta Z_{ij} + \beta_cI(t \ge 0) + e_{ij}\]

Where \(\beta_c\) could take the form of a random intercept or slope.

Daycare + Start Age

library(brms)
library(rstan)
theme_set(theme_trueMinimal())

# brmmod1 <- brm(ill ~ time:daycare + start + (1 | id),
#                prior = c(set_prior ("normal (0, 5)", class = "b")),
#                data = chng.df,
#                family = bernoulli,
#                cores = 4)

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output")
# saveRDS(brmmod1, "brm_dc_start.rds")
brmmod1 <- readRDS("brm_dc_start.rds")

plot(brmmod1)

All chains mix well and parameters are identifiable

summary(brmmod1)
 Family: bernoulli(logit) 
Formula: ill ~ time:daycare + start + (1 | id) 
   Data: chng.df (Number of observations: 2909) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 4000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Group-Level Effects: 
~id (Number of levels: 34) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.56      0.09     0.40     0.76       1510 1.00

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept        -1.50      0.20    -1.89    -1.11       1649 1.00
start             0.01      0.01    -0.00     0.02       1913 1.00
time:daycare0     0.03      0.00     0.02     0.04       4000 1.00
time:daycare1     0.01      0.00     0.00     0.01       4000 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(fixef(brmmod1))
               Estimate Est.Error   2.5%ile  97.5%ile
Intercept     0.2237293  1.221191 0.1513214 0.3285999
start         1.0082327  1.005029 0.9982279 1.0182705
time:daycare0 1.0344017  1.004938 1.0246628 1.0446055
time:daycare1 1.0070709  1.002295 1.0024738 1.0115797
pp_check(brmmod1)

Posterior predictive check shows a well fitting model

plot(marginal_effects(brmmod1), points = TRUE, newpage = FALSE, ask = FALSE)

The second plot doesn’t make much sense as daycare attendance before time = 0 cannot happen, by definition.

set1 <- RColorBrewer::brewer.pal(9, "Set1")

pframe <- data.frame(
  time = seq(-86, 91, by = 1),
  start = mean(chng.df$start)
  )

pframe <- transform(pframe, daycare = factor(ifelse(time >= 0, 1, 0)))

prediction <- data.frame(
  predict(brmmod1, 
          newdata = pframe, 
          re_formula = NA, 
          type = "response")
  )

pframe <- cbind(pframe, prediction)

colnames(pframe) <- c("time", "start", "daycare", "ill", "SE", "LB", "UB")

chng.df %>%
  ggplot(aes(time, ill, group = id)) %>%
  add(geom_line(alpha = 0.2)) %>%
  add(geom_line(data = pframe, aes(group = NA), color = set1[1])) %>%
  add(geom_vline(xintercept = 0, lty = 2)) %>%
  add(theme_trueMinimal(16)) %>%
  add(labs(
    x = "Standardized Study Time, Weeks",
    y = "Probability of ARI"
  ))

The red-line is the prediction from the model, with a changepoint at 0 for time of daycare entrance.

Daycare + Start Age + Sex

# brmmod2 <- brm(ill ~ time:daycare + start + female + (1 | id),
#                prior = c(set_prior ("normal (0, 5)", class = "b")),
#                data = chng.df,
#                family = bernoulli,
#                cores = 4)

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output")
# saveRDS(brmmod2, "brm_dc_start_sex.rds")
brmmod2 <- readRDS("brm_dc_start_sex.rds")

summary(brmmod2)
 Family: bernoulli(logit) 
Formula: ill ~ time:daycare + start + female + (1 | id) 
   Data: chng.df (Number of observations: 2909) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 4000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Group-Level Effects: 
~id (Number of levels: 34) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.57      0.10     0.41     0.80       1374 1.00

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept        -1.51      0.22    -1.94    -1.09       1698 1.00
start             0.01      0.01    -0.00     0.02       1790 1.00
female1           0.03      0.24    -0.41     0.51       1421 1.00
time:daycare0     0.03      0.01     0.02     0.04       4000 1.00
time:daycare1     0.01      0.00     0.00     0.01       4000 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(fixef(brmmod2))
               Estimate Est.Error   2.5%ile  97.5%ile
Intercept     0.2209524  1.240369 0.1432784 0.3350945
start         1.0081302  1.005360 0.9970919 1.0187651
female1       1.0317392  1.267428 0.6614249 1.6713275
time:daycare0 1.0345032  1.005061 1.0244529 1.0450096
time:daycare1 1.0070713  1.002297 1.0024698 1.0116028

Daycare + Start Age + siblings

# brmmod3 <- brm(ill ~ time:daycare + start + siblings + (1 | id),
#                prior = c(set_prior ("normal (0, 5)", class = "b")),
#                data = chng.df,
#                family = bernoulli,
#                cores = 4)

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output")
# saveRDS(brmmod3, "brm_dc_start_sibs.rds")
brmmod3 <- readRDS("brm_dc_start_sibs.rds")

summary(brmmod3)
 Family: bernoulli(logit) 
Formula: ill ~ time:daycare + start + siblings + (1 | id) 
   Data: chng.df (Number of observations: 2909) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 4000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Group-Level Effects: 
~id (Number of levels: 34) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.52      0.09     0.37     0.73       1166 1.00

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept        -1.31      0.22    -1.74    -0.88       1770 1.00
start             0.01      0.00    -0.00     0.02       1816 1.00
siblings1        -0.45      0.21    -0.87    -0.05       1584 1.00
time:daycare0     0.03      0.01     0.02     0.04       4000 1.00
time:daycare1     0.01      0.00     0.00     0.01       4000 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(fixef(brmmod3))
               Estimate Est.Error   2.5%ile  97.5%ile
Intercept     0.2704896  1.243354 0.1757173 0.4149754
start         1.0090654  1.004921 0.9990140 1.0188748
siblings1     0.6396194  1.230377 0.4197987 0.9559420
time:daycare0 1.0343807  1.005085 1.0243958 1.0450990
time:daycare1 1.0072342  1.002327 1.0026006 1.0117267
pframe1 <- data.frame(time = seq(-86, 91, by = 1),
                      start = mean(chng.df$start),
                      siblings = 1)

pframe0 <- data.frame(time = seq(-86, 91, by = 1), 
                      start = mean(chng.df$start),
                      siblings = 0)

pframe1 <- transform(pframe1, daycare = factor(ifelse(time >= 0, 1, 0)))
pframe0 <- transform(pframe0, daycare = factor(ifelse(time >= 0, 1, 0)))

prediction1 <- data.frame(
  predict(brmmod3, 
          newdata = pframe1, 
          re_formula = NA, 
          type = "response")
  )

prediction0 <- data.frame(
  predict(brmmod3, 
          newdata = pframe0, 
          re_formula = NA, 
          type = "response")
  )

pframe1 <- cbind(pframe1, prediction1)
pframe0 <- cbind(pframe0, prediction0)

names <- c("time", "start", "siblings", "daycare", "ill",  "SE",  "LB", "UB")
colnames(pframe1) <- names
colnames(pframe0) <- names

chng.df %>%
  ggplot(aes(time, ill, group = id)) %>%
  add(geom_line(alpha = 0.2)) %>%
  add(geom_line(data = pframe1, aes(group = NA, color = "HH Siblings"))) %>%
  add(geom_line(data = pframe0, aes(group = NA, color = "No HH Siblings"))) %>%
  add(geom_vline(xintercept = 0, lty = 2)) %>%
  add(theme_trueMinimal(16)) %>%
  add(theme(legend.position = "top")) %>%
  add(labs(
    x = "Standardized Study Time, Weeks",
    y = "Probability of ARI"
  )) %>%
  add(scale_color_manual(
    name = "",
    values = c(set1[1], set1[2])
  ))

Daycare + Start Age + HBoV-1 carry

# brmmod4 <- brm(ill ~ time:daycare + start + hbov.pos + (1 | id),
#                data = chng.df,
#                family = bernoulli,
#                cores = 4)

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output")
# saveRDS(brmmod4, "brm_dc_start_boca.rds")
brmmod4 <- readRDS("brm_dc_start_boca.rds")

summary(brmmod4)
 Family: bernoulli(logit) 
Formula: ill ~ time:daycare + start + hbov.pos + (1 | id) 
   Data: chng.df (Number of observations: 2909) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 4000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Group-Level Effects: 
~id (Number of levels: 34) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.57      0.10     0.41     0.79       1398 1.00

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept        -1.56      0.31    -2.19    -0.96       1769 1.00
start             0.01      0.01    -0.00     0.02       1649 1.00
hbov.pos1         0.07      0.26    -0.43     0.60       1585 1.00
time:daycare0     0.03      0.01     0.02     0.04       4000 1.00
time:daycare1     0.01      0.00     0.00     0.01       4000 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(fixef(brmmod4))
               Estimate Est.Error   2.5%ile  97.5%ile
Intercept     0.2102375  1.357167 0.1122201 0.3816667
start         1.0084224  1.005403 0.9976832 1.0192708
hbov.pos1     1.0774424  1.299533 0.6486571 1.8146452
time:daycare0 1.0344311  1.005127 1.0244060 1.0449316
time:daycare1 1.0070952  1.002309 1.0025900 1.0116842

Daycare + Start Age + Play Group Activity

# brmmod5 <- brm(ill ~ time:daycare + start + playgrp + (1 | id),
#                data = chng.df,
#                family = bernoulli,
#                cores = 4)

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output")
# saveRDS(brmmod5, "brm_dc_start_pg.rds")
brmmod5 <- readRDS("brm_dc_start_pg.rds")

summary(brmmod5)
 Family: bernoulli(logit) 
Formula: ill ~ time:daycare + start + playgrp + (1 | id) 
   Data: chng.df (Number of observations: 2909) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 4000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Group-Level Effects: 
~id (Number of levels: 34) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.58      0.10     0.42     0.80       1385 1.00

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept        -1.48      0.21    -1.88    -1.07       1151 1.01
start             0.00      0.01    -0.01     0.01       1291 1.00
playgrp1          0.43      0.18     0.09     0.77       2371 1.00
time:daycare0     0.03      0.01     0.02     0.04       4000 1.00
time:daycare1     0.01      0.00     0.00     0.01       4000 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(fixef(brmmod5))
               Estimate Est.Error   2.5%ile  97.5%ile
Intercept     0.2272134  1.233920 0.1519002 0.3422014
start         1.0037257  1.005679 0.9924607 1.0147402
playgrp1      1.5298619  1.192808 1.0904038 2.1694060
time:daycare0 1.0332169  1.005141 1.0230385 1.0442349
time:daycare1 1.0057872  1.002392 1.0011596 1.0104781
pframe1 <- data.frame(time = seq(-86, 91, by = 1),
                      start = mean(chng.df$start),
                      playgrp = 1)

pframe0 <- data.frame(time = seq(-86, 91, by = 1), 
                      start = mean(chng.df$start),
                      playgrp = 0)

pframe1 <- transform(pframe1, daycare = factor(ifelse(time >= 0, 1, 0)))
pframe0 <- transform(pframe0, daycare = factor(ifelse(time >= 0, 1, 0)))

prediction1 <- data.frame(
  predict(brmmod5, 
          newdata = pframe1, 
          re_formula = NA, 
          type = "response")
  )

prediction0 <- data.frame(
  predict(brmmod5, 
          newdata = pframe0, 
          re_formula = NA, 
          type = "response")
  )

pframe1 <- cbind(pframe1, prediction1)
pframe0 <- cbind(pframe0, prediction0)

names <- c("time", "start", "playgrp", "daycare", "ill",  "SE",  "LB", "UB")
colnames(pframe1) <- names
colnames(pframe0) <- names

chng.df %>%
  ggplot(aes(time, ill, group = id)) %>%
  add(geom_line(alpha = 0.2)) %>%
  add(geom_line(data = pframe1, aes(group = NA, color = "Playgroup"))) %>%
  add(geom_line(data = pframe0, aes(group = NA, color = "No Playgroup"))) %>%
  add(geom_vline(xintercept = 0, lty = 2)) %>%
  add(theme_trueMinimal(16)) %>%
  add(theme(legend.position = "top")) %>%
  add(labs(
    x = "Standardized Study Time, Weeks",
    y = "Probability of ARI"
  )) %>%
  add(scale_color_manual(
    name = "",
    values = c(set1[1], set1[2])
  ))

Daycare + Start Age + Random Slope

# brmmod6 <- brm(ill ~ time:daycare + start + (1 + time | id),
#                data = chng.df,
#                family = bernoulli,
#                cores = 4)

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output")
# saveRDS(brmmod6, "brm_dc_start_ranslope.rds")
brmmod6 <- readRDS("brm_dc_start_ranslope.rds")

summary(brmmod6)
 Family: bernoulli(logit) 
Formula: ill ~ time:daycare + start + (1 + time | id) 
   Data: chng.df (Number of observations: 2909) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 4000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Group-Level Effects: 
~id (Number of levels: 34) 
                    Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)           0.52      0.12     0.31     0.77       1358 1.01
sd(time)                0.03      0.01     0.02     0.04        777 1.01
cor(Intercept,time)    -0.16      0.28    -0.63     0.43        416 1.01

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept        -1.49      0.21    -1.91    -1.08       2105 1.00
start             0.01      0.01    -0.00     0.02       1652 1.00
time:daycare0     0.06      0.01     0.04     0.08       1706 1.00
time:daycare1     0.00      0.01    -0.01     0.01       1597 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(fixef(brmmod6))
               Estimate Est.Error   2.5%ile  97.5%ile
Intercept     0.2246804  1.234746 0.1476483 0.3394174
start         1.0092786  1.005370 0.9988433 1.0196657
time:daycare0 1.0574747  1.010074 1.0382016 1.0791380
time:daycare1 1.0025987  1.005723 0.9909871 1.0132603
pframe <- data.frame(
  time = seq(-86, 91, by = 1),
  start = mean(chng.df$start)
  )

pframe <- transform(pframe, daycare = factor(ifelse(time >= 0, 1, 0)))

prediction <- data.frame(
  predict(brmmod1, 
          newdata = pframe, 
          re_formula = NA, 
          type = "response")
  )

pframe <- cbind(pframe, prediction)

colnames(pframe) <- c("time", "start", "daycare", "ill", "SE", "LB", "UB")

chng.df %>%
  ggplot(aes(time, ill, group = id)) %>%
  add(geom_line(alpha = 0.2)) %>%
  add(geom_line(data = pframe, aes(group = NA), color = set1[1])) %>%
  add(geom_vline(xintercept = 0, lty = 2)) %>%
  add(theme_trueMinimal(16)) %>%
  add(labs(
    x = "Standardized Study Time, Weeks",
    y = "Probability of ARI"
  ))

Daycare + Start Age + Sex + Random Slope

# brmmod7 <- brm(ill ~ time:daycare + start + female + (1 + time | id),
#                prior = c(set_prior ("normal (0, 5)", class = "b")),
#                data = chng.df,
#                family = bernoulli,
#                cores = 4)

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output")
# saveRDS(brmmod7, "brm_dc_start_sex_rans.rds")
brmmod7 <- readRDS("brm_dc_start_sex_rans.rds")

summary(brmmod7)
 Family: bernoulli(logit) 
Formula: ill ~ time:daycare + start + female + (1 + time | id) 
   Data: chng.df (Number of observations: 2909) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 4000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Group-Level Effects: 
~id (Number of levels: 34) 
                    Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)           0.53      0.12     0.33     0.80       1396 1.00
sd(time)                0.03      0.01     0.02     0.04        758 1.00
cor(Intercept,time)    -0.15      0.27    -0.65     0.41        434 1.00

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept        -1.50      0.22    -1.93    -1.05       1483 1.00
start             0.01      0.01    -0.00     0.02       1125 1.00
female1          -0.02      0.24    -0.50     0.44       1849 1.00
time:daycare0     0.06      0.01     0.04     0.08       1506 1.00
time:daycare1     0.00      0.01    -0.01     0.01       1338 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(fixef(brmmod7))
               Estimate Est.Error   2.5%ile  97.5%ile
Intercept     0.2239861  1.242098 0.1455808 0.3494455
start         1.0094413  1.005880 0.9980914 1.0213499
female1       0.9803284  1.268980 0.6062712 1.5535620
time:daycare0 1.0574447  1.010290 1.0378150 1.0802672
time:daycare1 1.0023899  1.006104 0.9895731 1.0140825

Daycare + Start Age + Siblings + Random Slope

# brmmod8 <- brm(ill ~ time:daycare + start + siblings + (1 + time | id),
#                data = chng.df,
#                family = bernoulli,
#                cores = 4)

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output")
# saveRDS(brmmod8, "brm_dc_start_sibs_rans.rds")
brmmod8 <- readRDS("brm_dc_start_sibs_rans.rds")

summary(brmmod8)
 Family: bernoulli(logit) 
Formula: ill ~ time:daycare + start + siblings + (1 + time | id) 
   Data: chng.df (Number of observations: 2909) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 4000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Group-Level Effects: 
~id (Number of levels: 34) 
                    Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)           0.56      0.13     0.33     0.85        989 1.00
sd(time)                0.03      0.01     0.02     0.04        645 1.00
cor(Intercept,time)    -0.27      0.28    -0.73     0.34        329 1.01

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept        -1.35      0.27    -1.87    -0.81        685 1.00
start             0.01      0.01    -0.00     0.02        722 1.00
siblings1        -0.22      0.25    -0.72     0.27        998 1.00
time:daycare0     0.06      0.01     0.04     0.08       1604 1.00
time:daycare1     0.00      0.01    -0.01     0.01       1500 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(fixef(brmmod8))
               Estimate Est.Error   2.5%ile  97.5%ile
Intercept     0.2597251  1.308037 0.1548643 0.4468224
start         1.0082161  1.005725 0.9965307 1.0192626
siblings1     0.8045366  1.286465 0.4847968 1.3162066
time:daycare0 1.0577143  1.010279 1.0380981 1.0799853
time:daycare1 1.0020653  1.005967 0.9898552 1.0131747

Daycare + Start Age + HBoV-1 carry + Random Slopes

# brmmod9 <- brm(ill ~ time:daycare + start + hbov.pos + (1 + time | id),
#                data = chng.df,
#                family = bernoulli,
#                cores = 4)

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output")
# saveRDS(brmmod9, "brm_dc_start_boca_rans.rds")
brmmod9 <- readRDS("brm_dc_start_boca_rans.rds")

summary(brmmod9)
 Family: bernoulli(logit) 
Formula: ill ~ time:daycare + start + hbov.pos + (1 + time | id) 
   Data: chng.df (Number of observations: 2909) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 4000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Group-Level Effects: 
~id (Number of levels: 34) 
                    Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)           0.52      0.12     0.31     0.79       1442 1.00
sd(time)                0.03      0.01     0.02     0.04        879 1.00
cor(Intercept,time)    -0.13      0.29    -0.61     0.46        341 1.00

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept        -1.66      0.31    -2.27    -1.06       2253 1.00
start             0.01      0.01    -0.00     0.02       1392 1.00
hbov.pos1         0.19      0.25    -0.29     0.70       2425 1.00
time:daycare0     0.06      0.01     0.04     0.08       1802 1.00
time:daycare1     0.00      0.01    -0.01     0.01       1438 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(fixef(brmmod9))
               Estimate Est.Error   2.5%ile  97.5%ile
Intercept     0.1899273  1.358411 0.1031074 0.3456686
start         1.0101632  1.005617 0.9987768 1.0208606
hbov.pos1     1.2124909  1.287219 0.7501993 2.0127399
time:daycare0 1.0577131  1.010201 1.0386537 1.0802391
time:daycare1 1.0023537  1.005846 0.9906958 1.0136870

Daycare + Start Age + Play Group Activity

# brmmod10 <- brm(ill ~ time:daycare + start + playgrp + (1 | id),
#                 data = chng.df,
#                 family = bernoulli,
#                 cores = 4)

setwd("~/Documents/Martin Lab/HBov-1/illness_burden_dc/Rdata/output")
# saveRDS(brmmod10, "brm_dc_start_pg_rans.rds")
brmmod10 <- readRDS("brm_dc_start_pg_rans.rds")

summary(brmmod10)
 Family: bernoulli(logit) 
Formula: ill ~ time:daycare + start + playgrp + (1 | id) 
   Data: chng.df (Number of observations: 2909) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 4000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Group-Level Effects: 
~id (Number of levels: 34) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.58      0.10     0.42     0.80       1385 1.00

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept        -1.48      0.21    -1.88    -1.07       1151 1.01
start             0.00      0.01    -0.01     0.01       1291 1.00
playgrp1          0.43      0.18     0.09     0.77       2371 1.00
time:daycare0     0.03      0.01     0.02     0.04       4000 1.00
time:daycare1     0.01      0.00     0.00     0.01       4000 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
exp(fixef(brmmod10))
               Estimate Est.Error   2.5%ile  97.5%ile
Intercept     0.2272134  1.233920 0.1519002 0.3422014
start         1.0037257  1.005679 0.9924607 1.0147402
playgrp1      1.5298619  1.192808 1.0904038 2.1694060
time:daycare0 1.0332169  1.005141 1.0230385 1.0442349
time:daycare1 1.0057872  1.002392 1.0011596 1.0104781
pframe1 <- data.frame(time = seq(-86, 91, by = 1),
                      start = mean(chng.df$start),
                      playgrp = 1)

pframe0 <- data.frame(time = seq(-86, 91, by = 1), 
                      start = mean(chng.df$start),
                      playgrp = 0)

pframe1 <- transform(pframe1, daycare = factor(ifelse(time >= 0, 1, 0)))
pframe0 <- transform(pframe0, daycare = factor(ifelse(time >= 0, 1, 0)))

prediction1 <- data.frame(
  predict(brmmod10, 
          newdata = pframe1, 
          re_formula = NA, 
          type = "response")
  )

prediction0 <- data.frame(
  predict(brmmod10, 
          newdata = pframe0, 
          re_formula = NA, 
          type = "response")
  )

pframe1 <- cbind(pframe1, prediction1)
pframe0 <- cbind(pframe0, prediction0)

names <- c("time", "start", "playgrp", "daycare", "ill",  "SE",  "LB", "UB")
colnames(pframe1) <- names
colnames(pframe0) <- names

chng.df %>%
  ggplot(aes(time, ill, group = id)) %>%
  add(geom_line(alpha = 0.2)) %>%
  add(geom_line(data = pframe1, aes(group = NA, color = "Playgroup"))) %>%
  add(geom_line(data = pframe0, aes(group = NA, color = "No Playgroup"))) %>%
  add(geom_vline(xintercept = 0, lty = 2)) %>%
  add(theme_trueMinimal(16)) %>%
  add(theme(legend.position = "top")) %>%
  add(labs(
    x = "Standardized Study Time, Weeks",
    y = "Probability of ARI"
  )) %>%
  add(scale_color_manual(
    name = "",
    values = c(set1[1], set1[2])
  ))

Misc. Functons

# true minimal theme
theme_trueMinimal <- function(base_size = 16, base_family = "") {
    theme_bw(base_size = base_size, base_family = base_family) %+replace% 
        theme(axis.ticks = element_blank(), 
              legend.background = element_blank(), 
              legend.key = element_blank(), 
              panel.background = element_blank(), 
              strip.background = element_blank(), 
              panel.border = element_blank(),
              panel.grid.major = element_blank(), 
              panel.grid.minor = element_blank(),
              plot.background = element_blank(), 
              axis.line = element_line(), 
              axis.line.y = element_blank(),
              complete = TRUE)
}

References

Allander, Tobias, Tuomas Jartti, Shawon Gupta, Hubert G. M. Niesters, Pasi Lehtinen, Riikka Üsterback, Tytti Vuorinen, et al. 2007. “Human Bocavirus and Acute Wheezing in Children.” Clinical Infectious Diseases 44 (7): 904–10. doi:10.1086/512196.

Allander, Tobias, Martti T. Tammi, Margareta Eriksson, Annelie Bjerkner, Annika Tiveljung-Lindell, and Björn Andersson. 2005. “Cloning of a Human Parvovirus by Molecular Screening of Respiratory Tract Samples.” Proceedings of the National Academy of Sciences of the United States of America 102 (36): 12891–6. doi:10.1073/pnas.0504666102.

Braun, D K, G Dominguez, and P E Pellett. 1997. “Human Herpesvirus 6.” Clinical Microbiology Reviews 10 (3): 521–67. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC172933/.

Christensen, Andreas, Svein Arne Nordbø, Sidsel Krokstad, Anne Gro Wesenberg Rognlien, and Henrik Døllner. 2010. “Human Bocavirus in Children: Mono-Detection, High Viral Load and Viraemia Are Associated with Respiratory Tract Infection.” Journal of Clinical Virology 49 (3): 158–62. doi:10.1016/j.jcv.2010.07.016.

Fry, Alicia M., Xiaoyan Lu, Malinee Chittaganpitch, Teresa Peret, Julie Fischer, Scott F. Dowell, Larry J. Anderson, Dean Erdman, and Sonja J. Olsen. 2007. “Human Bocavirus: A Novel Parvovirus Epidemiologically Associated with Pneumonia Requiring Hospitalization in Thailand.” The Journal of Infectious Diseases 195 (7): 1038–45. doi:10.1086/512163.

Jartti, Tuomas, Klaus Hedman, Laura Jartti, Olli Ruuskanen, Tobias Allander, and Maria Söderlund-Venermo. 2012. “Human Bocavirus—the First 5 years.” Reviews in Medical Virology 22 (1): 46–64. doi:10.1002/rmv.720.

Kahn, Jeffrey S., Deniz Kesebir, Susan F. Cotmore, Anthony D’Abramo, Christi Cosby, Carla Weibel, and Peter Tattersall. 2008. “Seroepidemiology of Human Bocavirus Defined Using Recombinant Virus-Like Particles.” The Journal of Infectious Diseases 198 (1): 41–50. doi:10.1086/588674.

Martin, Emily T., Jane Kuypers, Judson Heugel, and Janet A. Englund. 2008. “Clinical Disease and Viral Load in Children Infected with Respiratory Syncytial Virus or Human Metapneumovirus.” Diagnostic Microbiology and Infectious Disease 62 (4): 382–88. doi:10.1016/j.diagmicrobio.2008.08.002.

Martin, Emily T., Jane Kuypers, John P. McRoberts, Janet A. Englund, and Danielle M. Zerr. 2015. “Human Bocavirus 1 Primary Infection and Shedding in Infants.” The Journal of Infectious Diseases 212 (4): 516–24. doi:10.1093/infdis/jiv044.

Zerr, Danielle M., Amalia S. Meier, Stacy S. Selke, Lisa M. Frenkel, Meei-Li Huang, Anna Wald, Margaret P. Rhoads, et al. 2005. “A Population-Based Study of Primary Human Herpesvirus 6 Infection.” New England Journal of Medicine 352 (8): 768–76. doi:10.1056/NEJMoa042207.

November, 20, 2017